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A field programmable analog array (FPAA) is presented as an energy and computational 
efficiency engine: a mixed mode processor for which functions can be compiled at 
significantly less energy costs using probabilistic computing circuits. More specifically, 
it will be shown that the core computation of any dynamical system can be computed 
on the FPAA at significantly less energy per operation than a digital implementation. 
A stochastic system that is dynamically controllable via voltage controlled amplifier and 
comparator thresholds is implemented, which computes Bernoulli random variables. From 
Bernoulli variables it is shown exponentially distributed random variables, and random 
variables of an arbitrary distribution can be computed. The Gillespie algorithm is simulated 
to show the utility of this system by calculating the trajectory of a biological system 
computed stochastically with this probabilistic hardware where over a 127X performance 
improvement over current software approaches is shown. The relevance of this approach 
is extended to any dynamical system. The initial circuits and ideas for this work were 
generated at the 2008 Telluride Neuromorphic Workshop. 
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1. INTRODUCTION 

Due to the large computational efficiency gap that is theorized 
between classic digital computing and neuromorphic style com- 
puting, particularly in biological systems, this work seeks to 
explore the potential efficiency gains of a neuromorphic approach 
using stochastic circuits to solve dynamical systems. 

There is wide demand for a technology to compute dynam- 
ical systems much more efficiently with some recent examples 
being to calculate quantum equations to aid in the devel- 
opment of quantum computers or in the search for new 
meta-materials and pharmaceuticals using high computational 
throughput search methods for new meta-compounds. Standard 
digital computers — even super computers — have proven to be 
inefficient at these tasks limiting our ability to innovate here. 

Stochastic functions can be computed in a more efficient way 
if these stochastic operations are done natively in probabilistic 
hardware. Neural connections in the cortex of the brain is just 
such an example and occur on a stochastic basis (Douglas, 2008). 
The neurotransmitter release is probabilistic in regards to synapse 
firings (Goldberg et al., 2001) in neural communication. Most 
germaine to this work, many chemical and biological reactions 
occur on a stochastic basis and are modeled here via probabilistic 
circuits compiled on a reconfigurable field programmable analog 
array (Gillespie, 1976). 

Gillespie effectively showed that molecular reactions occur 
probabilistically and gave a method for translating a system of N 
chemical equations, normally specified by deterministic differen- 
tial equations, into a system of probabilistic, Markov processes. 
This will be the dynamic system described and computed herein. 
It has been shown that in a system with a sufficiently small 



number of molecules, the stochastic method is more accurate than 
its deterministic counterpart. It was further proven that in the 
thermodynamic limit (large number of molecules) of such a sys- 
tem, the deterministic and stochastic forms are mathematically 
equivalent (Oppenheim et al., 1969; Kurtz, 1972). 

Mathematical results will be extended from Gillespie's algo- 
rithm to show that any dynamical system can be computed using 
a system of stochastic equations. The efficiency in computing 
such a system is greatly increased by a direct, probabilistic hard- 
ware implementation that can be done in the analog co-processor 
we present. However, how to express a general dynamical system 
stochastically will not be discussed, only that a formulation exists 
that is more efficient when computed with natively probabilistic 
hardware. 

Several encryption algorithms and other on-chip solutions for 
a uniformly random number generator in hardware have been 
shown for microprocessors (Ohba et al., 2006). Generating static 
Bernoulli trials, where a 1 is generated with fixed probability p 
and 0 is generated with fixed probability 1 — p, were proposed 
using amplified thermal noise across digital gates and is also not a 
novel concept, but the work in which this concept was described 
was not fabricated, measured, or applied in hardware, and only 
existed in theory (Chakrapani et al, 2006). Static probabilities, 
or those with a fixed p -value, will not allow the performance 
gains that are possible in many stochastic systems because without 
dynamic p-values stochastic processes cannot be fully realizable in 
hardware. 

There has been a hardware solution proposed for dynamic 
Bernoulli trial generators, where the probability p can be 
dynamically reconfigured via reprogramming a floating gate 
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transistor (Xu et al., 1972). While this latter work illustrates a 
note -worthy solution and is a unique implementation to pro- 
vide dynamic probability adjustment, there is an overhead cost 
in terms of time to readjust the probability]? due to the nature of 
floating gate programming, which were not designed in that work 
for the continuous updates that are required for the applications 
presented here. 

The topic of generating stochastic variables with hardware 
circuits has also been addressed previously in Genov and 
Cauwenberghs (2001), but not in this manner. We make a con- 
tribution to the literature by showing that not only can we 
produce stochastic variables, but we can tune the probability of 
these stochastic variables in real time through a software control- 
lable input to the Bernoulli trial generator circuit via the FPAA. 
Further, we show how an array of these can be compiled on hard- 
ware and where outputs are input to a priority encoder to create 
an exponentially random variable for the first time known to the 
authors. Finally, this paper shows how the FPAA, with tunable 
stochastic variables which can be dynamically tuned in real time, 
results in significant performance gains of 127X for computing 
dynamical systems. 

In short, this paper will present several novel contributions 
including 

• A novel circuit for fast dynamic Bernoulli random number 
generation. 

• A compiled chaos circuit to generate environment independent 
probabilistic variables. 

• A novel circuit for fast dynamic exponentially distributed ran- 
dom number generation. 

• Analysis of the performance gains provided by the latter cir- 
cuits over current methods for Gillespie's algorithm that apply 
to many biological applications and stochastic applications in 
general. 

• Extension of the latter methods for applicability to any dynam- 
ical system and the result that all dynamic systems calculated 
stochastically require exponentially distributed random num- 
bers. 

• A method for going from concept to circuit measurements in 
2 weeks using a novel reconfigurable chipset developed in part 
by the authors. 

Section 2 introduces the technology behind building probabilistic 
function generators in hardware using thermal noise characteris- 
tics. Section 3 reviews implementation of dynamical systems in 
general and specifically Gillespie's Algorithm to give a context for 
why these circuits are important. Section 4 will review the chipset 
that was built in part by the authors and how it can used for 
faster stochastic computation. Section 5 will discuss the hardware 
results and experimental measurements. Section 6 will conclude 
the paper and discuss future directions. 

2. EXTREMELY EFFICIENT STOCHASTIC CIRCUITS 

Stochastic computation has shown to be a powerful tool to com- 
pute solutions to systems that would otherwise require complex 
continuous -time differential equations. However, the efficacy of 
stochastic methods are lost if complex computations are needed 
to produce the digital representation of these stochastic results. 



We present several circuits to solve these issues that can 
uniquely be compiled in analog technology, and thus our FPAA 
co-processor. 

2.1. A PROGRAMMABLE THERMAL NOISE CIRCUIT 

Thermal noise is a well-defined, well-behaved phenomenon that 
we show can be used as a computational resource within the FPAA 
fabric. This work will show that it can be used not only as a 
resource for random number generation, but for the generation 
of arbitrarily complex probabilistic functions. 

The current through a transistor, and hence the voltage at the 
drain or source node of a transistor, shows the probabilistic ther- 
mal noise effect as shown in Figure 1, and can be used as the basic 
building block of any probabilistic function generator. 

Thermal noise present in integrated circuits, also known as 
Johnson Noise, is generated by the natural thermal excitation of 
the electrons in a circuit. When modeled on a capacitive load, the 
root-mean-square voltage level of the thermal noise is given by 

Ut = J~^t where k is Boltzmann's constant, T is temperature, 
and C is the capacitance of the load. The likelihood of the voltage 
level of thermal noise is modeled as a Gaussian probability func- 
tion and has an equivalent magnitude throughout its frequency 
spectrum and is thus known as white Gaussian noise (Kish, 2002). 

To take advantage of the well-defined properties of ther- 
mal noise trapped on a capacitor, the circuit in Figure 2A was 
developed. This circuit can be broken down into the circuit 
components seen in Figure 2B. 

Thermal noise voltage on a 0.35 /xm process, which is the pro- 
cess size used for testing in this paper, with capacitor values in the 
range of C = 500 fF has RMS noise level of roughly 100 /xV. Even 
with a 100 mV supply, the thermal noise voltage that would cause 
a probabilistic digital bit flip is 1000a down from supply, giv- 
ing the probabilistic function designer limited options. Hence, in 
these experiments the thermal voltage signal was routed through 
two operational transconductance amplifiers (OTA) with gain A\. 
These circuits are shown in Figure 3. 

By routing the amplified thermal noise signal through a com- 
parator, and then comparing this signal to a user selectable voltage 
signal, a Bernoulli trial is created where the comparator outputs 
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FIGURE 1 | (A) The noise characteristics generated by a single transistor 
can be used as well-defined noise source. The current through and thus the 
voltage at the drain or source node of the transistor produces noise due to 
the thermal agitation of charge carriers. (B) Voltage measured from a 
transistor on 0.35 |xm test chip. 
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a digital "1" if the amplified thermal voltage signal is greater than 
the probability select and a "0" is output otherwise. A probability 
p can be set for the Bernoulli trial by setting the input voltage to 
the comparator such that it is less or more likely for a randomly 
varying thermal voltage to surpass this value, "Probability Select." 
The integral from the input voltage to Vdd of the thermal noise 
function is the probability of a digital 1, and the probability of 
a digital 0 is the integral from ground to the input voltage. This 
concept is illustrated in Figure 4. 

Note that the reconfigurable FPAA used to program this circuit 
has floating- gate controllable current biases such that they can be 
used to offset temperature effects. 

A Bernoulli random variable is generated with a probability p 
dynamically selectable by the user using these techniques. A useful 
property of Bernoulli trials is that an arbitrary probability distri- 
bution can be created when used in large numbers (as the number 
of Bernoulli trials N —> oo) they can be used to create an arbi- 
trary probability distribution. This phenomenon is illustrated in 
Figure 5. 

An exponential random variable is generated from a Bernoulli 
variable in the following way. X is defined here as the number 
of Bernoulli trials needed to produce a success, and this variable 
X is exponentially distributed. For example, to require six coin 
flips to produce a heads is exponentially less likely than to require 
two flips to get a head, since this is nothing more than a standard 
geometric sequence. The shape of this exponential distribution is 
controlled by the probability p of the Bernoulli trials. 



Pr(X = k) = (l-p) k - l p 



(1) 



<J 



FIGURE 2 | Circuit used to take advantage of the well-defined thermal 
noise properties on a capacitor where the root-mean-square (RMS) 
voltage of the noise on a capacitor is V n = yp£. The thermal noise on 
this capacitor is used as the gate voltage for PMOS and NMOS transistors 
where a current with the same spectrum as the thermal noise is generated. 
This ultimately produces a voltage controlled current through the 
diode-connected transistor. 



Figure 6 shows how these Bernoulli random variables are used 
to create an exponentially distributed number by being placed as 
inputs to a priority encoder. Recall that a priority encoder works 
by encoding the output to represent in binary which input was the 
first in priority order (from top to bottom for example) to be a 1. 
Also recall from Equation (1) that the number of Bernoulli trials 
needed to get a 1 is exponentially distributed. !The top Bernoulli 
trial in the figure is considered our first trial, the second from the 
top, our second trial etc. So the priority encoder in Figure 6 is 
encoding for us how many trials are needed to get a success (1), 
exactly our exponential distribution. 

Using these methods, an exponentially distributed random 
number can be generated in two clock cycles, a vast improvement 
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FIGURE 3 | Circuits for generating a Bernoulli random variable (1 with 
probability p and 0 with probability 1-p). (A) Dynamic Bernoulli 
Probability Circuit. Thermal noise is trapped on the capacitor, amplified 
twice through two operational transconductance amplifiers (OTA) with gain 
of Aj then put through a comparator with a probability select input. Note 
that these three OTA's are the same basic circuit programmed to different 
functionality. (B) Nine-transistor OTA used in (A) for amplifier and 
comparator circuits. 
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Thermal Noise Voltage (V) 

FIGURE 4 | The probability distribution function of thermal noise and 
the probability of generating a P(1) or P(0) in the Bernoulli variable 
generating circuit. The probability of a 1 is the integral under the 
probability distribution function of the thermal noise from the comparator 
voltage to the supply voltage of the circuit, V dd . 
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FIGURE 5 | A Bernoulli probability trial can be used to generate an arbitrary probability distribution with the correct transfer function. This result will 
be taken advantage of to greatly increase the performance of dynamical systems in this paper. 




FIGURE 6 | Illustration of how to transform N Bernoulli trials into an 
exponential distribution. Bernoulli trials put through a priority encoder as 
inputs results in an exponentially distributed probability function, where the 
shape of the exponential function can be tuned through the threshold 
inputs to the Bernoulli trials. 



over software and other current methods, which will be explained 
in the next section. 

2.2. PROGRAMMING BERNOULLI TRIALS AT TELLURIDE WORKSHOP 

All of the above circuits from the previous section were con- 
ceived of, designed, built, measured, and compiled in about in 
the 2 weeks of the 2008 Telluride Neuromorphic workshop. This 
accomplishment is both a testament to the productivity that the 
Telluride Neuromorphic workshop allows as well as a testament 
to the quick prototyping capability of the FPAA. 

The Telluride Neuromorphic workshop is unique in that some 
of the most brilliant minds in the country get together for an 
extended several week session dedicated to teaching, working with 
real hardware, and producing results such that students are com- 
pletely immersed in a world class engineering environment, day 
and night, for 2-3 weeks. 

The FPAA is analogous to the FPGA for logic circuits in that 
circuits can be conceived of, programmed onto the chip, and mea- 
sured in the same day. The FPAA has a mature toolset where 
an analog designer can conceive of a circuit, such as during a 
Telluride lecture session on analog design, and can simply cre- 
ate a netlist. The FPAA tools automatically take this netlist and 
optimally compile it to the fabric using a bitstream to program 
the on-board floating gate devices to set switches allowing net- 
works of active and passive devices, set current sources, bias 
currents, amplifier characteristics, and calibrate out device mis- 
match. A standard multi-meter is connected to the FPAA test 
board via built-for-test pinned out circuit leads. The multi-meter 



in this instance was connected back to the computer via GPIB 
that was producing the netlist to allow a full hardware in the 
loop programmable environment. Current toolsets are even more 
advanced allowing Simulink and other Simulink-like tools to 
build the circuit netlists. 

2.3. TEMPERATURE INVARIANT BERNOULLI TRIALS 

The thermal noise circuits used to create Bernoulli trials shown 
in the previous section have the well known side effect that 
their accuracy is highly dependent on temperature. And although 
methods such as adjusting bias currents with temperature are 
available on the FPAA, we present a temperature invariant 
method here to address this potential variability in the Bernoulli 
trials presented previously. These chaos circuits were built as 
follow-up work to the Telluride workshop. 

Chaos circuits were chosen to exemplify a more temperature 
invariant method. The model and explanation for the low-power 
chaos circuit used in this paper is first presented in Dudek and 
Juncu (2005). 

A chaos circuit works by introducing a seed value to a non- 
linear "chaos map" circuit which is itself chaotic. The sample and 
hold circuit then captures a continuous voltage value for con- 
sumption by a stochastic algorithm. The chaos map from Dudek 
and Juncu (2005) was chosen because of its proven results, but 
also because it only requires nine transistors and is extremely 
energy efficient. 

The resulting chaos map with a tunable control voltage to 
dictate the probability characteristics is shown in Figure 7. 

While further reading may be needed to understand the chaos 
circuit map shown in Figure 7, this map is very close to the results 
expected as shown in the literature. The general idea is that a 
given output voltage will result in a random assignment to the 
chaos map, allowing us to generate random variables in a tem- 
perature invariant way. The idea is that this chaos map could be 
used in place of the thermal noise circuits should the designer be 
concerned about temperature. 

These circuits all have something in common: they can be used 
to directly compute a stochastic function, cannot be compiled on 
a digital chip, and compute more efficiently than a digital system. 

Next we show how the usefulness of these circuits in a 
dynamical system. 

3. GILLESPIE'S ALGORITHM FOR STOCHASTIC 
COMPUTATION 

The previous findings are used to generate the results of a chem- 
ical and biological system using Gillespie's algorithm in this 
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FIGURE 7 | Measured y(x) voltage vs. V c for the chaos map circuit 
compiled on the FPAA. 



section. This section will also review the expense to calculate 
the trajectory of stochastic systems in software as a compari- 
son. Gillespie's algorithm is a natively probabilistic algorithm that 
takes advantage of the naturally stochastic trajectories of molecu- 
lar reactions (Gillespie, 1976), this algorithm is described below. 

3.1. GILLESPIE'S ALGORITHM 

1 . Initialize. Set the initial number of each type of molecule in the 
system and time, t = 0. 

2. For each reaction i, calculate the propensity function to obtain 
parameter value, a z -. 

3. For each i, generate a reaction time X\ according to an expo- 
nential distribution with parameter dj. 

4. Let \i be the reaction time whose time is least, r M . 

5. Change the number of molecules to reflect execution of reac- 
tion fi. Set t = t + Tfji. 

6. If initialized time or reaction constraints met, finished. If not, 
go to step 2. 

We use complexity analysis, or big- Oh, analysis to analyze the 
algorithms here where O(x) gives an expression, x, that represents 
the worst case running time of the algorithm. , Only algorith- 
mic improvements in software have been made to computing 
Gillespie's algorithm, until this work, such as Gibson et al. who 
have improved the running time of the algorithm from 0(Er) to 
0(r + Elogr) where E is the number of reaction events in the tra- 
jectory and r is the number of different reaction types (Gibson 
and Bruck, 1998). Several orders of magnitude improvement in 
energy efficiency and performance can be realized by comput- 
ing the exponentially distributed random variable r z - in hardware. 
Note that the big-Oh function does not change, just the way we 
implement this algorithm is much improved. 

The generation of the exponentially distributed random num- 
ber is the bottleneck of the algorithm, and the computational 
complexity of each step is calculated to show this. The metric used 
to judge computational cost is the number of clock cycles it takes 



to do a given calculation on a modern general purpose CPU as 
described in Patterson and Hennessy (2004). 

A load instruction is used to initialize a variable in Step 1. 
With the best case with a multiple data fetch scheme such as in 
the cell processor, this requires a single computational step. The 
propensity function a\ in Step 2 is calculated by a floating point 
multiplication (FPMUL), which takes five computational steps in 
a modern processor per reaction (Gillespie, 1976; Patterson and 
Hennessy, 2004). All r reactions assuming 8 FPMUL units are 
available takes y total computational steps. In Step 4, the mini- 
mum reaction time r M takes r — 1 compare operations. Assuming 
8 ALU (compare) units are available, Step 4 takes ^ compu- 
tational steps. Step 5 involves r — 1 integer addition/subtraction 
operations taking again ^ computational steps. Step 6 is an 
update in the program counter resulting in a single step. Step 3 
is a key step where each r z - according to an exponential distribu- 
tion. Generating an exponentially distributed random number is 
complex and deserves a bit more treatment. 

This is believed to be the first method to generate an exponen- 
tially distributed random number in hardware, and the random 
numbesr generated by other current methods is only pseudo- 
random. The Park-Miller algorithm on a modern advanced 
processor is the best known software method where a uni- 
formly pseudo-random number is generated in 88 computational 
steps (Park and Miller, 1998; Patterson and Hennessy, 2004). The 
equation to transform a uniformly distributed random variable 
U to one with an exponential distribution E with parameter X is 
shown in Equation (2). 



The natural logarithm function, In is extremely expensive, and 
even in the best case of computing this on a modern digital signal 
processor (DSP) takes 136 computational steps by itself accord- 
ing to Yang et al. (2002). Thus counting the FP multiply and the 
FP divide taking 5 steps and 32 steps, respectively (Patterson and 
Hennessy, 2004), it takes a total of 261 computational steps to 
generate a single exponentially distributed pseudo-random vari- 
able in software. Thus Step 3 alone takes 26 lr computational steps 
to generate X{ for all i reactions. To review the number of compu- 
tational steps for each part of the algorithm is shown below. 

3.2. COMPUTATIONAL STEPS IN GILLESPIE'S ALGORITHM 

Algorithmic step Computational steps 

(1 initialize. 1 

(2) Multiply to find each a;, y 

(3) Generate each r z -. 26 lr 

(4) Find V ^ 

(5) Update. ^ 

(6) Go to step 2 1 

Thus for a conservative value of 8 = 2, generating each expo- 
nentially distributed r; in Step 3 takes approximately 98% of the 
computational steps for a single iteration of Gillespie's algorithm. 
Seen in this light, the problem of improving exponential random 
variable generation becomes quite an important one. 
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3.3. EXPANSION TO ANY DYNAMICAL SYSTEM 

It has been shown in Gillespie (1976) and Kurtz (1972) that the 
trajectory of a chemical system consisting of N different reactant 
concentrations can be expressed as both a system of determinis- 
tic differential equations and a system of stochastic processes. For 
deeper understanding of Equations (3-5) that follows, the reader 
is encouraged to read these aforementioned references. This con- 
cept can be generalized to any dynamical system where the time 
evolution of a system of N state variables that has been described 
in a classical, state-based deterministic model as 

^=/i(*i,xa,x3,...) 

%=h(x 1 ,x 2 ,x 3 ,...) 



and in general as: 



X = F(X) 



(3) 



This system can also be expressed as a stochastic, Markov process. 

For the stochastic system, the probability that state variable x^ 
is updated during the next time interval r is assigned, P(r, [i)dx. 
More formally, this is the joint probability density function at 
time t expressing the probability that the next state update will 
occur in the differential time interval (t + r, t + r + dr) and that 
this update will occur to state variable x^ for \jl = 1 ... N and 
0 < r < oo. 

Given probability Po(t), the probability that no state-space 
update occurs in the interval (t, t+ r), and the probability, 
that an update to state x^ will occur in the differential time inter- 
val (t + r, t + r + dr) we have the general form for the joint 
probability function: 



P(r, \i)dr = Pq(t) • OL^dx 



(4) 



Note that is based on the state of the system X. Also note that 
determining is the critical factor in determining the Markov 
process representation and no general method for this is given 
here. In the chemical system example, a M is the probability that 
reaction is going to occur in the differential time interval and is 
a function of the number of each type of molecule currently in the 
system. The probability that more than one state update will occur 
during the differential time interval is shown to be small and thus 
ignored (Kurtz, 1972; Gillespie, 1976). Finally some function 
must be given describing the update to state variable x^ once 
an update occurs. We then have the stochastic, Markov process 
defined for the system: 



Pr[X(t + r + dr) = G(X(t)) | X(t)] = P(r, fi) 



(5) 



Note that this formulation does not make the assumption that 
infinitesimal dz need be approximated by a finite time step Ar, 
which is a source of error in many Monte Carlo formulations. 

To solve this system using computational methods, random 
numbers are generated according to the probability distributions 
described by Equation (5). No matter what dynamical system is 
involved, exponentially distributed random numbers will always 



be needed. To calculate Po(t) from Equation (4), we break the 
interval {t, t + r) into K subintervals of equal length e = r/K, 
and calculate the probability that no state update occurs in the 
first € subinterval (t, t + e) which is: 



Y[ [1 - <*# + o(e)] = 1-J2 a i € + 



(6) 



This probability is equal for every subinterval (t, t + 2e), (t, t + 
36), and so on. Thus the probability Po(t) for all K subintervals 
is: 



Po(r) 



lim 



= lim 



1 - ^aie + o{e 

i=l 

N 

1 - ^2<XiT/K+0(T/K) 



i = 1 



(7) 



(8) 



Where o(e) is the probability that more than one event occurs in 
the time interval 6. Following the analysis in Gillespie (1976), we 
assume as our incremental time interval goes to zero our func- 
tion o(e) —> 0 as well. With o(r/K) — > 0 we are left with the 
probabilistic, exponential function in Equation (9). 



P 0 (r) = exp 



(9) 



Thus we prove how this work can be extended to any dynamical 
system. 

4. RECONFIGURABLE ANALOG HARDWARE FOR 
STOCHASTIC COMPUTATION 

These complex probability functions are generated on a recon- 
figurable platform, reviewed in this section. More specifically, a 
dynamic Bernoulli Trial generator is illustrated on chip. This novel 
method involves using the reconfigurable analog signal proces- 
sor (RASP) chip that was recently introduced (Basu et al., 2008). 
This device allows one to go from concept to full functionality 
for analog circuits in a matter of weeks or even days instead of 
months or years for large-scale, integrated circuits. The design 
presented went from concept to measured hardware in a matter of 
2 weeks. The other useful feature is that many FPGA type archi- 
tectures allow a designer to build a subset of the possible circuits 
available with the RASP. Circuits such probabilistic function gen- 
erators could not be produced on strictly digital reconfigurable 
architectures although digital designs can be built on the RASP. 
The RASP chip is shown in Figure 8. 

4.1. STOCHASTIC CIRCUIT ARCHITECTURE 

The macro architecture and details of the algorithmic implemen- 
tation via Bernoulli trials and how this is built on the RASP chip 
is explored here. The RASP has 32 reconfigurable, computational 
analog blocks (CABs). The elements of a CAB and the elements 
that are used in this design are shown in Figure 9. 
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FIGURE 8 | Micrograph of the Reconfigurable Analog Signal Processor 
(RASP), also referred to the Field Programmable Analog Array (FPAA). 

The circuits were compiled onto this device. Computational Analog Blocks 
(CAB) populate the chip where both the computational elements and the 
routing is configured via floating gates. Buffers, capacitors, transmission 
gates, NMOS, PMOS, floating gates, current biases, adaptive amplifiers, 
and analog multiply arrays are all available and fully programmable on the 
chip. Development is shared by many in the CADSP group at Georgia Tech. 



An array of up to 32 Bernoulli trials can be calculated simul- 
taneously on a single RASP device. New versions of the RASP 
have been fabricated in 350, 130, and 45 nm; an entire family 
of RASP 2.9 chip variants exist for different applications spaces, 
which allow as much as 10X this number of Bernoulli trials, and 
this scales with Moore's law. The RASP chipset and accompa- 
nying tools also have the ability to be linked together easily for 
a multi-core RASP chipset should more Bernoulli generators be 
needed. The RASP chipset is useful for a proof of concept here. 
Since each Bernoulli generator only takes 30 transistors, many 
thousands of these circuits could be built in custom hardware if 
needed. 

5. CHIP MEASUREMENTS AND EXPERIMENTAL RESULTS 

To gather data, the Probability Select line described in Figure 3A 
and the output producing the random numbers were routed 
to the primary inputs/outputs of the chip. A 40-channel, 14- 
bit digital-to-analog-converter (DAC) chip was interfaced with 
the RASP chip on a printed circuit board, which we used as 
our testing apparatus, so that any arbitrary voltage could be 
input to the Probability Select line. This chip is 40 channel in 
the sense that there are 40 independent outputs of the DAC. 
The outputs of the RASP chip were connected to oscilloscope 
probes so that the noise spectrum and random numbers could be 
captured. 
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FIGURE 9 | The typical routing structure and available analog 
computational elements in one of the 32 computational analog blocks 
(CABs) present on the RASP chip. The elements used in the present 
design are highlighted in red. Alternating CABs have NMOS transistors and 
PMOS as the bottom CAB element, and although only three of the 
MOSFETs are used in our circuit from Figure 3, instead of the four that are 
circled, this is meant to show several different combinations of these four 
transistors can be used to make the three transistor circuit. The top triangle 
elements are an OTAs showing the two inputs and output being routed 
back into the fabric which is a mesh of lines with switches able to connect 
any two lines that cross in the mesh. The element below the OTAs is a 
capacitor and the bottom elements circled are NMOS and PMOS available 
to be routed. 



The distribution of the Bernoulli trial circuits were measured 
in the following way: 2500 random numbers were captured at 
each Probability Select voltage. The number of successes was 
divided by the total number of samples captured at each voltage 
to calculate the probability of a Bernoulli success (probability of 
randomly generating a 1). The results are shown in Figure 10. 

An example of a voltage signal from the Dynamic Bernoulli 
Probability Circuit producing a digital "1" with probability p = 
0.90 is shown in Figure 11. The voltage signal was recorded via 
on-chip measurement circuits and transmitted to a PC through a 
USB connection to the chipset. 

The array of Bernoulli trials was encoded and the exponential 
distribution of reaction times, r z -, was generated. The result- 
ing distribution is what one would expect and matches a true, 
exponential distribution as shown in Figure 12. 
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FIGURE 10 | Measurements from the output of the Dynamic Bernoulli 
Probability Circuit shown in Figure 3 were taken. The "Probability 
Select" voltage was adjusted from 0.1 volts up to 1.4 volts and the resulting 
probability of a digital "1 " being produced was recorded with a 95% 
confidence interval. Measurements were only taken down to p = 0.5 since 
a Bernoulli trial is symmetric about this value. 



5.1. VALIDATION OF RANDOMNESS 

A probabilistic output and a random output are differing con- 
cepts, and the ability to control this difference is the strength 
of the proposed circuits. They are linked together and defined 
through Shannon's entropy (Shannon, 1949). Formally, entropy 
and thus the randomness of a function are defined by H in 
Equations (10, 11). 
Let 

H n = -- J2 p(i,j,...,s)log 2 p(i,j,...,s) (10) 

i,j,...,s 

Then entropy is 

H= lim H n (11) 

where p(i,j, s) is the probability of the sequence of symbols 
i, j, s, and the sum is over all sequences of n symbols. 

By the same definition, a function exhibits the most random- 
ness if H is maximized, which occurs when all output sequences 
are equally likely, or equivalently, if all possible outputs have 
an equal probability of occurring (Shannon, 1949). From this 
work, a function is defined as random if all outputs have a uni- 
form probability of occurring. Conversely, we define a function 
as probabilistic if the function has an entropy 0 < H < log 2 n. 

There exist statistical measures of randomness, developed by 
the National Institute of Standards and Technology (NIST), in 
the form of a suite consisting of 16 independent tests to mea- 
sure how random a sequence of numbers truly is Random Number 
Generation and Testing (2014). However, these tests only measure 
the performance of random functions and not probabilistic ones 
such as the circuits presented in this work, although random 
number generation (when p = 0.5) is a subset function of these 
circuits. 
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FIGURE 11 | Voltage output from Dynamic Bernoulli Probability Circuit 
also called a random number generator (RNG) circuit as labeled in the 
graph, corresponding to the output out of the third (final) OTA circuit 
from Figure 3. A 1 volt offset was arbitrarily chosen for the comparator, but 
other voltage offset values than this were anecdotally observed to have 
undesired noise effects resulting in spurious switching at the output of the 
comparator. The noise amplifiers, and thus comparators, were observed to 
switch at approximately 208 ps as a maximum rate, thus independent 
Bernoulli variables could be produced at most at this time period in this 
particular device. A digital "1 " is produced with probability p = 0.90 and 
"0" with 1 - p = 0.10. When the voltage is above the threshold 
V ou t > Vdd 2 Vss it is considered a digital "1 " and otherwise a "0," where the 
threshold happens to be 0 volts in this case. The samples in this circuit 
change much faster than can be consumed, and thus random samples are 
taken from the output of this circuit at a slower rate than the rate at which it 
changes state, but preserving randomness. 



Each of the tests are measured on a scale from 0 to 1, where 
a "passing" mark is considered >0.93 and higher marks indi- 
cate a higher quality sequence. For a random output p = 0.5 
these circuits with thermal noise as the source of randomness 
passed all but the "Overlapping Template Matching Test" and 
the "Lempel-Ziv Complexity Test" and even these two tests 
received high marks >0.80. They also perform consistently better 
than the software generated, Park-Miller psuedo-random num- 
bers used by most algorithms, which failed half the tests in 
the suite with some failing badly <0.10 (Chakrapani et al, 
2006). 

6. CONCLUSIONS AND FUTURE DIRECTIONS 

It was shown in section 3 that to generate an exponentially dis- 
tributed random variable in software takes a minimum of 261 
computational steps with the Park Miller algorithm. And with 
hardware random number generators shown in previous micro- 
processor works, only uniformly random numbers were available. 
Bernoulli trials are generated here in hardware with a single com- 
putational step, and an exponentially distributed random number 
is generated with two computational steps. 

Because of the high gain of our amplifiers, the thermal noise 
distribution used to generate probabilistic distributions with our 
hardware is extremely sensitive to perturbations such as ambi- 
ent electrostatic interactions, device variations, and changes in 
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FIGURE 12 | The histogram (and thus a scaled probability distribution 
function) of the exponentially distributed Gillespie reaction times, t,'s, 



ambient temperature. Environment invariant chaos circuis were 
compiled and measured to mitigate these concerns. Because the 
Bernoulli trial circuits presented here can be controlled via a 
programmable input signal, software calibration can be done to 
mitigate these concerns as well. 

An estimated performance increase of approximately 130X is 
realized based on measured results to generate exponentially dis- 
tributed random numbers. With the assumption used in section 3 
that generating exponential random variables takes up 98% of the 
computation time of a single iteration through Gillespie's algo- 
rithm, our system could potentially speed up the calculation of 
the trajectory of this algorithm by approximately 127X. 

Further is was shown that these performance increases via 
hardware generated probabilistic distributions can be applied to 
any dynamical system and possibly have a much wider impact than 
the field of biological computations. 

Such a method to increase computational efficiency by two 
orders of magnitude is believed to be widely useful in calculat- 
ing biological, statistical, or quantum mechanical systems. The 
search for meta-materials new medicines, or any other host of 
applications could benefit. Future directions for this specific work 
include attempting to hardware accelerate quantum algorithms 
and venturing into the world of software defined analog radios. 
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